Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing the first doc record. #60

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

inetufo
Copy link

@inetufo inetufo commented Feb 27, 2017

missing the first doc record.

missing the first doc record bug
missing the first doc record
@inetufo
Copy link
Author

inetufo commented Feb 27, 2017

@phutchins The first doc record is not handled, pls review this pr and merge to master to release a new gem :)

@phutchins
Copy link
Owner

@inetufo can you give a little more detail around what you're changing here? Seems like there's more going on than catching the missing first doc.

It looks like you're adding a range instead of a starting point. The current behavior is to check if we have a placeholder. If we don't have one, start at the beginning of the collection and process all entries waiting for more. If we do have a placeholder, we start there and continue processing all entries waiting for more.

Copy link
Owner

@phutchins phutchins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @inetufo thanks for taking the time to work on this. I've added a few comments and requested a few changes. If you can update to make these changes we can get this merged.

collection = mongodb.collection(mongo_collection_name)
# Need to make this sort by date in object id then get the first of the series
# db.events_20150320.find().limit(1).sort({ts:1})
if first_entry_id_object == last_id_object
@logger.info("get_cursor_for_collection first entry:#{first_entry_id_object}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might should be debug level logging.

@@ -144,10 +144,15 @@ def get_collection_names(mongodb, collection)
end

public
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
def get_cursor_for_collection(mongodb, mongo_collection_name, first_entry_id_object, last_id_object, batch_size)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably rename these variables as they're a bit confusing.

Maybe...
first_entry_id_object => processing_start_entry_id_object
last_id_object => last_unprocessed_entry_id_object (or maybe first_unprocessed_entry_id_object)

... or something like that.

@@ -112,7 +112,7 @@ def get_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name)
if x[:place].nil? || x[:place] == 0
first_entry_id = init_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name)
@logger.debug("FIRST ENTRY ID for #{mongo_collection_name} is #{first_entry_id}")
return first_entry_id
return [first_entry_id, first_entry_id]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should return the placeholder or nil and we can wrap it with a get_processing_start_id() or something that returns us the palceholder or the beginning of the db.

@@ -248,7 +265,8 @@ def run(queue)
last_id_object = Time.at(last_id)
end
end
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)

cursor = get_cursor_for_collection(@mongodb, collection_name, first_entry_id_object, last_id_object, batch_size)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather only pass get_cursor_for_collection the starting point and batch_size and use a separate method as mentioned above to work out what that should be.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, when are you going to merge this branch? I really need it, thank you.

@TaurusD
Copy link

TaurusD commented Aug 4, 2017

Hi, this is a real bug need to be solved. Please refer to #71

@hadeer-e
Copy link

@inetufo can you give a little more detail around what you're changing here? Seems like there's more going on than catching the missing first doc.

It looks like you're adding a range instead of a starting point. The current behavior is to check if we have a placeholder. If we don't have one, start at the beginning of the collection and process all entries waiting for more. If we do have a placeholder, we start there and continue processing all entries waiting for more.

I added a placeholder and also it removes the first document of data !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants